Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove inspect.stack() from @rule parsing to fix import time regression in py3 #7447

Merged

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Mar 27, 2019

Problem

Fixes #7373, fixes #7442. inspect.stack() in the @rule decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line:

caller_frame = inspect.stack()[1][0]

In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an inspect.stack() in the @rule decorator was taking up 66% of the import time:

Screen Shot 2019-03-27 at 14 38 53 (2)

This profile was generated via applying the following diff:

diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca1..7510e6e 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -101,7 +101,9 @@ class PantsLoader(object):
   def load_and_execute(entrypoint):
     assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
     module_path, func_name = entrypoint.split(':', 1)
-    module = importlib.import_module(module_path)
+    from pants.util.contextutil import maybe_profiled
+    with maybe_profiled('import_module_profile.prof'):
+      module = importlib.import_module(module_path)
     entrypoint_main = getattr(module, func_name)
     assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
     entrypoint_main()

then running ./pants goals to produce import_module_profile.prof, then running snakeviz import_module_profile.prof. snakeviz must be installed with pip3 for some reason.

Solution

  • Instead of getting the current interpreter frame with inspect.stack(), get the decorated function's module from sys.modules, and use that to obtain the type-valued arguments of a Get call during @rule parsing.

Result

./pants goals went from ~4.8 seconds to ~2.3 seconds, while ./pants2 goals went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for Get() arguments are resolved from the @rule function's module should be unchanged.

Explanation

@Eric-Arellano noted that the return type of inspect.stack() changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it.

@jsirois ran a quick experiment:

$ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done
python2.7: 2.63740515709
python3.4: 3.8067204039980425
python3.5: 4.471806422996451
python3.6: 4.527370049007004
python3.7: 3.3335486440046225

Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing native_toolchain.py is significantly faster than when it is imported indirectly.

A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889):

per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. This is the expensive step.

Apparently, inspect.stack(0) might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change.

TODO

Make a python bug about this regression if it isn't already known.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phenomenal! Great work figuring this out.

Some suggestions for PR description:

  1. Add command you ran to generate the snakeviz visualization. Thanks for including the diff already.
  2. With the proportional comment, reference the section "Foil in this hypothesis: importing directly is fine" from V2 rule resolution causing substantial startup performance hit at import time #7442. I think this proportional idea explains what is going on.

@cosmicexplorer
Copy link
Contributor Author

@Eric-Arellano I believe I have addressed both of your suggestions!

@Eric-Arellano Eric-Arellano added this to the 1.15.x milestone Mar 28, 2019
@cosmicexplorer cosmicexplorer changed the title remove inspect.stack() from @rule parsing to fix import time regression remove inspect.stack() from @rule parsing to fix import time regression in py3 Mar 28, 2019
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cosmicexplorer cosmicexplorer merged commit 04a9079 into pantsbuild:master Mar 28, 2019
stuhood pushed a commit that referenced this pull request Mar 29, 2019
…on in py3 (#7447)

### Problem

Fixes #7373, fixes #7442. `inspect.stack()` in the `@rule` decorator was responsible for the majority of time taken during import when running pants in python 3, in particular, this line: https://github.com/pantsbuild/pants/blob/6739a81b6008274bfb4abd3436d9adb543a19e5a/src/python/pants/engine/rules.py#L245

In the below image a snakeviz profile was created of the import statement which brings in all our v2 rules, showing that an `inspect.stack()` in the `@rule` decorator was taking up 66% of the import time:

<img width="1744" alt="Screen Shot 2019-03-27 at 14 38 53 (2)" src="https://user-images.githubusercontent.com/1305167/55118058-3a0b6b00-50aa-11e9-9343-bcc2ca0e53d2.png">

This profile was generated via applying the following diff:
```diff
diff --git a/src/python/pants/bin/pants_loader.py b/src/python/pants/bin/pants_loader.py
index 91a4ca1..7510e6e 100644
--- a/src/python/pants/bin/pants_loader.py
+++ b/src/python/pants/bin/pants_loader.py
@@ -101,7 +101,9 @@ class PantsLoader(object):
   def load_and_execute(entrypoint):
     assert ':' in entrypoint, 'ERROR: entrypoint must be of the form `module.path:callable`'
     module_path, func_name = entrypoint.split(':', 1)
-    module = importlib.import_module(module_path)
+    from pants.util.contextutil import maybe_profiled
+    with maybe_profiled('import_module_profile.prof'):
+      module = importlib.import_module(module_path)
     entrypoint_main = getattr(module, func_name)
     assert callable(entrypoint_main), 'ERROR: entrypoint `{}` is not callable'.format(entrypoint)
     entrypoint_main()
```
then running `./pants goals` to produce `import_module_profile.prof`, then running `snakeviz import_module_profile.prof`. `snakeviz` must be installed with `pip3` for some reason.

### Solution

- Instead of getting the current interpreter frame with `inspect.stack()`, get the decorated function's module from `sys.modules`, and use that to obtain the type-valued arguments of a `Get` call during `@rule` parsing.

### Result

`./pants goals` went from ~4.8 seconds to ~2.3 seconds, while `./pants2 goals` went from ~2.3 seconds to ~2.2 seconds. The semantics of how types for `Get()` arguments are resolved from the `@rule` function's module *should* be unchanged.

### Explanation
@Eric-Arellano noted that the return type of `inspect.stack()` changed in python 3.5: https://docs.python.org/3/library/inspect.html#inspect.stack, which may have contributed to it.

@jsirois ran a quick experiment:
```
$ for py in python{2.7,3.{4,5,6,7}}; do echo -n "$py: " && $py -c "import timeit; print(timeit.timeit('import inspect; inspect.stack()', number=10000))"; done
python2.7: 2.63740515709
python3.4: 3.8067204039980425
python3.5: 4.471806422996451
python3.6: 4.527370049007004
python3.7: 3.3335486440046225
```

Note that the original line of code was getting the whole stack just to get the one containing the decorated method, so it's likely that this slowdown is proportional to depth (or worse). See the section "Foil in this hypothesis" in #7442 for some behavior which may bear out the hypothesis that slowdown is proportional to stack depth, e.g. directly importing `native_toolchain.py` is significantly faster than when it is imported indirectly.

A StackOverflow answer describes that filesystem access in generating each stack frame may be the culprit (https://stackoverflow.com/a/17407257/2518889):
> per frame, collect the filename, linenumber, and source file context (the source file line plus some extra lines around it if requested). The latter requires reading the source file for each stack frame. *This is the expensive step.*

Apparently, `inspect.stack(0)` might be usable to avoid this file access, but since the resulting code is able to avoid deep introspection anyway, it seems reasonable to keep this change.

### TODO

Make a python bug about this regression if it isn't already known.
Eric-Arellano added a commit that referenced this pull request Mar 31, 2019
This test no longer fails with Python 3. It was likely fixed by #7447, as the test was timing out originally.

All Python 3 integration tests were ran three times to ensure we aren't adding a flaky test.
Eric-Arellano added a commit that referenced this pull request Mar 31, 2019
…ctor their BUILD entries to be more granular (#7463)

### Problem
Currently, you can only run `./pants test tests/python/pants_test/backend/python/tasks:tasks` or `./pants test tests/python/pants_test/backend/python/tasks:integration`. 

This lack of granularity makes it difficult to develop in this folder, as the former has a test timeout of 10 minutes and the latter of 40 minutes. While you can use `./pants test.pytest -- -k test_name`, this does not get the specific file but rather the names of individual tests.

It also makes it harder to tell which tests are causing the long timeouts for this folder.

### Solution
Each test file gets its own BUILD entry, as we do in most folders.

#### Also remove tests from Py3 blacklist
All tests now pass with Python 3. This is hypothesized to be thanks to:
1) Performance improvement thanks to #7447. As we recall, some of the tests were timing out, and this performance fix reduces the risk of timeouts.
2) Skipping Pexrc-related tests thanks to #7285.

Note all integration tests were ran four times in CI to check for flakiness.

#### Also refactor `test_python_run_integration.py`'s interpreter selection
Deduplicate the constraints and update the constraints to reflect our modern interpreter usage. For example, currently Py37 may not be chosen, which means our cron job would potentially fail.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants